-
Notifications
You must be signed in to change notification settings - Fork 8
feat: Add support for List data types #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
28f225d to
0253f70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extends the Arrow integration to support List data types while refactoring nested type handling. Key changes include renaming ArrowNestedType to ArrowTypeStruct, adding a new ArrowTypeList and corresponding builders/arrays, and updating relevant tests to cover both primitive and nested lists.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| Arrow/Tests/ArrowTests/TableTests.swift | Updated closure signature and return type for struct builders |
| Arrow/Tests/ArrowTests/IPCTests.swift | Updated schema construction to use the new ArrowTypeStruct |
| Arrow/Tests/ArrowTests/ArrayTests.swift | Added tests for ListArray functionality with both primitive and nested lists |
| Arrow/Sources/Arrow/ProtoUtil.swift | Refactored type creation for struct and added support for list fields |
| Arrow/Sources/Arrow/ArrowWriter.swift | Updated list handling by replacing ArrowNestedType with ArrowTypeStruct |
| Arrow/Sources/Arrow/ArrowType.swift | Renamed ArrowNestedType → ArrowTypeStruct and added ArrowTypeList |
| Arrow/Sources/Arrow/ArrowReaderHelper.swift & ArrowReader.swift | Added list data loading functions |
| Arrow/Sources/Arrow/ArrowBufferBuilder.swift | Replaced nested type usage and added ListBufferBuilder |
| Arrow/Sources/Arrow/ArrowArrayBuilder.swift | Introduced ListArrayBuilder and updated builder resolution |
| Arrow/Sources/Arrow/ArrowArray.swift | Added ListArray implementation with subscript and asString |
|
@abandy You may want to review this. |
Will do, thanks! |
0f11175 to
c4f7ef0
Compare
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Copilot <[email protected]>
@mgrazianoc Initial review it lgtm! I will try to find sometime later this week to go over again. |
|
Hi @abandy , do you think we could have this merged by the next weeks? Anything I can help, please let me know. |
|
@kou maybe we could consider merging this before migrating to the standard project structure? Otherwise this will need rebasing and the authors have been patiently waiting to get it merged for more than one month. |
|
OK. Could you review this before we merge this? |
willtemperley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general it looks good, but I think there should be a way to specify the nullability of primitives in the list array.
Maybe an API could be added to create a default list field type, similar to arrow-rs. e.g.:
extension ArrowField {
public init(listFieldWith type: ArrowType, isNullable: Bool) {
self.init(
name: "item",
type: type,
isNullable: isNullable
)
}
}59a4f6d to
5063daa
Compare
willtemperley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API for ArrowTypeList now allows element nullability, but this isn't being passed to the ListArrayBuilder. Even if non-nullability isn't actually enforced (a separate issue) I think the public API should allow this to be specified.
willtemperley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ArrowTypeList needs to be passed to the ListArrayBuilder to specify element nulllability.
|
@willtemperley let me know if the new changes conforms to your concerns and rationale. Thanks for being active on feedbacks |
willtemperley
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
|
Thanks both! I'll merge this. |
Rationale within the changes
This PR refactors and extends support for nested types in the Arrow integration. The current implementation of
ArrowNestedTypeis tailored primarily for data structs, as seen inStructBufferBuilder. However, it lacks broader support and certain expected functionalities, such asloadStructArrayBuilder.To address this, the following improvements have been made:
ArrowNestedTypetoArrowTypeStructto align with naming conventions used elsewhere in the codebase.ArrowTypeList, including nested lists.For simplicity, instead of introducing a dedicated subtype for lists, this PR uses an interface of
[Any?]?. If this approach proves insufficient, there are more explicit alternatives that can be explored.NOTE: Work on
ArrowCExporterandArrowCImporterhas been intentionally deferred. These components require a deeper understanding of memory ownership and child parsing, and I believe it's better to be addressed in a future PR, unless it's strict necessary.What's Changed
ArrowNestedType -> ArrowTypeStruct.ArrowTypeList, including nested lists.ListArraywith basic.asStringformatting.ListArrayBuilder.ArrowArrayBuilderto support the.listtype.loadStructArrayBuilderandloadListArrayBuilder.ListBufferBuilder.ArrowReader.loadListData.makeListHolder.Are these changes tested?
Tests are included in
ArrayTests.swift. It's also working on internal applications, including integration withArrowFlight.Closes #16.